Four datasets from the World Life Expectancy data (Income per Person, Life Expectancy, Population Size, and Country Regions) contain information that potentially affects life expectancy around the world. The data cover the period from 1800 to 2018. The four datasets were combined into a single dataset for this analysis.
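The merge itself is not shown in this report. As a minimal sketch of how such a combination could be done, assume each source is already in long format with one row per country and year. The toy values, the placeholder country names, and the helper name `yearly` are hypothetical; only the final column names follow the combined dataset.

```r
# Toy long-format inputs (hypothetical values, for illustration only).
income <- data.frame(country = c("A", "A", "B", "B"),
                     year    = c(1800, 1801, 1800, 1801),
                     income  = c(600, 610, 700, 705))
life   <- data.frame(country = c("A", "A", "B", "B"),
                     year    = c(1800, 1801, 1800, 1801),
                     Life_Expectancy = c(28.0, 28.1, 35.0, 35.2))
pop    <- data.frame(country = c("A", "A", "B", "B"),
                     year    = c(1800, 1801, 1800, 1801),
                     Population_size = c(3000000, 3010000, 400000, 402000))
regions <- data.frame(country = c("A", "B"),
                      region  = c("Asia", "Europe"))

# Join the three country-year tables on (country, year).
yearly <- Reduce(function(x, y) merge(x, y, by = c("country", "year")),
                 list(income, life, pop))

# Attach the region, which depends on the country only.
merged <- merge(regions, yearly, by = "country")

# Order rows by year then country, and fix the column order.
merged <- merged[order(merged$year, merged$country),
                 c("country", "region", "year", "income",
                   "Life_Expectancy", "Population_size")]
print(merged)
```

An inner join like this drops any country-year pair missing from one of the inputs, which may or may not match how the actual combined file was built.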
The combined dataset contains 40437 observations and 6 variables. The variables are:
- country - Name of the country (categorical)
- Region - The region the country belongs to (categorical)
- Year - Year (categorical)
- Income - Income in constant international dollars (numerical)
- Life expectancy - Number of years (numerical)
- Population Size - Total population count of a country (numerical)

A copy of the combined dataset can be found here: https://github.com/chinwex/STA553/raw/main/w05/merged_data.csv
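# Packages used in this report: kable() from knitr, filter() from dplyr, plot_ly() from plotly
library(knitr); library(dplyr); library(plotly)
dt <- read.csv(file="https://github.com/chinwex/STA553/raw/main/w05/merged_data.csv")[,-1]  # drop the first column
kable(head(dt))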
| country | region | year | income | Life_Expectancy | Population_size |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1800 | 603 | 28.2 | 3280000 |
| Albania | Europe | 1800 | 667 | 35.4 | 410000 |
| Algeria | Africa | 1800 | 715 | 28.8 | 2500000 |
| Angola | Africa | 1800 | 618 | 27.0 | 1570000 |
| Antigua and Barbuda | Americas | 1800 | 757 | 33.5 | 37000 |
| Argentina | Americas | 1800 | 1510 | 33.2 | 534000 |
A subset containing only the records for the year 2015 was
created and named dt2015. It has the same 6 variables
(country, region, year, income, life expectancy and population size) and
187 observations.
dt2015 <- filter(dt, year == 2015)
kable(head(dt2015))
| country | region | year | income | Life_Expectancy | Population_size |
|---|---|---|---|---|---|
| Afghanistan | Asia | 2015 | 1750 | 57.9 | 33700000 |
| Albania | Europe | 2015 | 11000 | 77.6 | 2920000 |
| Algeria | Africa | 2015 | 13700 | 77.3 | 39900000 |
| Andorra | Europe | 2015 | 46600 | 82.5 | 78000 |
| Angola | Africa | 2015 | 6230 | 64.0 | 27900000 |
| Antigua and Barbuda | Americas | 2015 | 20100 | 77.2 | 99900 |
myPlotlyLayout <- function(n){
  layout(n,
    ### Title
    title = list(text = "Association Between Income and Life Expectancy",
                 font = list(family = "Times New Roman",  # HTML font family
                             size = 18,
                             color = "#009E73")),
    ### Legend
    legend = list(title = list(text = 'Region',
                               font = list(family = "Times New Roman",
                                           size = 18,
                                           color = "#009E73")),
                  bgcolor = "#e6ffff",
                  bordercolor = "navy",
                  groupclick = "togglegroup",  # either "toggleitem" or "togglegroup"
                  orientation = "v"            # vertical legend
    ),
    ## Background
    plot_bgcolor = '#e6ffff',
    ## Axis labels
    xaxis = list(color = "#0072B2",
                 title = list(text = 'Income',
                              font = list(family = 'Times New Roman', size = 18)),
                 zeroline = TRUE,
                 zerolinecolor = '#ccffff',
                 zerolinewidth = 0.5,
                 showgrid = TRUE,
                 gridcolor = '#e6f2ff'),
    yaxis = list(color = "#0072B2",
                 title = list(text = 'Life Expectancy',
                              font = list(family = 'Times New Roman', size = 18)),
                 zeroline = TRUE,
                 zerolinecolor = '#ccffff',
                 zerolinewidth = 0.5,
                 showgrid = TRUE,
                 gridcolor = '#ccffff'),
    ## Annotations
    annotations = list(
      x = 0.7,  # between 0 and 1: 0 = left, 1 = right
      y = 0.3,  # between 0 and 1: 0 = bottom, 1 = top
      font = list(size = 12,
                  color = "#D55E00"),
      text = "The point size is proportional to the Population Size",
      xref = "paper",      # "container" spans the entire width of the plot;
                           # "paper" refers to the width of the plotting area only
      yref = "paper",      # same as xref
      xanchor = "center",  # horizontal alignment with respect to the x position
      yanchor = "bottom",  # vertical alignment with respect to the y position
      showarrow = FALSE
    )
  )
}
# Marker size uses a cubed log transform of population size: (3*log(p) - 11)^3
plot_ly(
  data = dt2015,
  x = ~income,              # horizontal axis
  y = ~Life_Expectancy,     # vertical axis
  color = ~factor(region),  # one color per region
  colors = c("#D55E00", "#CC79A7", "#F0E442", "#0072B2", "#009E73"),
  text = ~paste("Country: ", country,
                "<br>Population Size: ", Population_size),
  hovertemplate = paste('<i><b>Life Expectancy</b></i>: %{y}',
                        '<br><b>Income</b>: %{x}',
                        '<br><b>%{text}</b>'),
  alpha = 0.5,
  size = ~(3*log(Population_size)-11)^3,
  type = 'scatter',
  mode = 'markers',
  marker = list(opacity = 0.7, sizemode = "area", sizeref = 0.3),
  width = 800,
  height = 500
) %>% myPlotlyLayout()
This scatterplot shows income against life expectancy for 187 countries in 2015. Points are colored by region (Africa, Americas, Asia, Europe and Oceania), and point size increases with each country's population size. The relationship between income and life expectancy appears curvilinear: it is roughly linear at first and then gradually bends into a curve.
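The plotting code sizes markers with the expression `(3*log(p) - 11)^3` rather than with raw population. The sketch below applies that same expression to three populations taken from the tables in this report, to show how it compresses large differences. The helper name `size_transform` is introduced here for illustration, and plotly applies its own rescaling on top of these values, so only their relative spacing is meaningful.

```r
# Marker-size transform as written in the plotting code.
size_transform <- function(p) (3 * log(p) - 11)^3

# 2015 populations copied from the tables in this report.
pop <- c("Antigua and Barbuda" = 99900,
         "Albania"             = 2920000,
         "Afghanistan"         = 33700000)
sizes <- size_transform(pop)

# Compare population ratios with size ratios, both relative to the smallest country.
comparison <- data.frame(population = pop,
                         pop_ratio  = round(pop / min(pop), 1),
                         size_ratio = round(sizes / min(sizes), 2))
print(comparison)
```

The transform is increasing in population, so larger countries always get larger markers, but the size ratios are far smaller than the population ratios. This keeps very populous countries from swamping the plot.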
Countries in Africa and a few in Asia have the lowest incomes and the shortest life expectancies, while European and American countries have the highest incomes and the longest life expectancies. The Asian countries with the largest populations have life expectancies between 60 and 70 years. The plot also shows that many countries with the same income, such as countries in Africa and parts of Asia and Oceania, differ in life expectancy. Overall, life expectancy tends to be higher where income is higher.
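One common way to examine a curved income relationship like this is to compare the correlation on the raw income scale with the correlation on the log income scale. The sketch below does this for the six 2015 rows shown in the table earlier, not the full 187-country subset, so it illustrates the technique rather than giving a result for the whole dataset. The object names are assumptions made for this example.

```r
# First six 2015 rows from the table in this report (illustrative sample only).
head2015 <- data.frame(
  country = c("Afghanistan", "Albania", "Algeria",
              "Andorra", "Angola", "Antigua and Barbuda"),
  income = c(1750, 11000, 13700, 46600, 6230, 20100),
  Life_Expectancy = c(57.9, 77.6, 77.3, 82.5, 64.0, 77.2)
)

# Pearson correlation with income on the raw scale and on the log scale.
cor_raw <- cor(head2015$income, head2015$Life_Expectancy)
cor_log <- cor(log(head2015$income), head2015$Life_Expectancy)
print(round(c(raw = cor_raw, log = cor_log), 3))
```

On these six rows the log-scale correlation is the higher of the two, which is consistent with a relationship that flattens as income grows. Running the same two lines on `dt2015` would check whether this holds for all countries.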
This plot shows the large gap in life expectancy between low-income and high-income countries. The difference may be due to people in high-income countries having greater access to healthcare, education, good food and water than people in low-income areas. From the plot, Japan had the highest life expectancy, 83.8 years, with an average income of 37,800 and a population of 128,000,000. Singapore followed closely with a life expectancy of 83.6 years, an average income of 80,900 and a population of 5,540,000.
Central African Republic and Lesotho had the lowest life expectancies, 49.7 years and 49.6 years respectively; both are countries in the African region.
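The extremes above were read from the plot. A minimal sketch of how they could be found programmatically is given below, using only the four values quoted in the text. The data frame `cited` and the other object names are hypothetical; on the full subset the same ordering step would be applied to `dt2015`.

```r
# Life expectancies quoted in the text for 2015.
cited <- data.frame(
  country = c("Japan", "Singapore", "Central African Republic", "Lesotho"),
  Life_Expectancy = c(83.8, 83.6, 49.7, 49.6)
)

# Sort from highest to lowest life expectancy.
ranked <- cited[order(-cited$Life_Expectancy), ]

highest <- ranked$country[1]
lowest  <- ranked$country[nrow(ranked)]
print(ranked)
```

For a single extreme, `which.max()` and `which.min()` on the life expectancy column would give the row index directly.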
This plot shows changes in life expectancy and income from 1800 to 2018 for countries in Africa, the Americas, Asia, Europe and Oceania.
# Marker size uses a cubed log transform of population size: (3*log(p) - 11)^3
plot_ly(
  data = dt,
  x = ~income,              # horizontal axis
  y = ~Life_Expectancy,     # vertical axis
  color = ~factor(region),  # one color per region
  colors = c("#D55E00", "#CC79A7", "#F0E442", "#0072B2", "#009E73"),
  frame = ~year,            # one animation frame per year
  text = ~paste("Country: ", country,
                "<br>Population Size: ", Population_size),
  hovertemplate = paste('<i><b>Life Expectancy</b></i>: %{y}',
                        '<br><b>Income</b>: %{x}',
                        '<br><b>%{text}</b>'),
  alpha = 0.5,
  size = ~(3*log(Population_size)-11)^3,
  type = 'scatter',
  mode = 'markers',
  marker = list(opacity = 0.7, sizemode = "area", sizeref = 0.3),
  width = 800,
  height = 500
) %>% myPlotlyLayout()
From the plot, life expectancy was low worldwide in the early years and has increased over time. In 1800, life expectancy in most countries lay between 20 and 40 years, with average annual income between 300 and 5,000. Both income and life expectancy have risen gradually since then. The highest life expectancy in 2018 was in Japan, at 84.2 years, with an income of about 39,100.
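The trend described above can also be summarized numerically by averaging life expectancy within each year. The sketch below uses only the twelve rows printed in the two tables of this report (six for 1800, six for 2015), so the averages are illustrative and are not the global figures. The object names are assumptions for this example.

```r
# Life expectancy values from the two tables shown in this report.
sample_rows <- data.frame(
  year = rep(c(1800, 2015), each = 6),
  Life_Expectancy = c(28.2, 35.4, 28.8, 27.0, 33.5, 33.2,   # 1800 rows
                      57.9, 77.6, 77.3, 82.5, 64.0, 77.2)   # 2015 rows
)

# Mean life expectancy per year.
yearly_mean <- aggregate(Life_Expectancy ~ year, data = sample_rows, FUN = mean)
print(yearly_mean)
```

Applying the same `aggregate()` call to the full `dt` data frame would give one average per year from 1800 to 2018. A population-weighted mean would be a different, and arguably more representative, global summary.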